Japanese Dialogue Corpus of Multi-Level Annotation
Abstract
This paper describes a Japanese dialogue corpus annotated with multi-level information, built by the Japanese Discourse Research Initiative, Japanese Society for Artificial Intelligence. The annotations cover speech, transcription delimited by slash units, prosody, part of speech, dialogue acts, and dialogue segmentation. In the project, we used the corpus to obtain new findings by examining the relationship between linguistic information and dialogue acts, the relationship between prosodic information and dialogue segments, and the characteristics of agreement/disagreement expressions and non-sentence elements.

1 Introduction

This paper describes a Japanese dialogue corpus annotated with multi-level information, such as speech, linguistic, and discourse information, built by the Japanese Discourse Research Initiative and supported by the Japanese Society for Artificial Intelligence. Dialogue corpora are now indispensable to the speech and language research communities. The corpora have been used not only for examining the relationship between speech and linguistic phenomena, but also for building speech and language understanding systems. Sharing corpora among researchers is highly desirable, since creating them incurs considerable cost: writing and revising annotation manuals, annotating the data, and checking the consistency and reliability of the annotated data. The Discourse Research Initiative was set up in March 1996 by US, European, and Japanese researchers to develop standardized discourse annotation schemes (Carletta et al., 1997; Core et al., 1998). The efforts of the initiative have been called 'standardization', but this name is misleading, to say the least. In typical standardization efforts, such as those in audio-visual and telecommunication technologies, commercial companies try to expand the market for their products or interfaces through the standard.
The objective of standardization efforts in discourse is to promote interaction among discourse researchers and thereby provide a solid foundation for corpus-based discourse research, avoiding duplicated resource-building effort and increasing the pool of shareable resources. In cooperation with this initiative, the Japanese Discourse Research Initiative was started in Japan in May 1996, supported by the Japanese Society for Artificial Intelligence (JDRI, 1996; Ichikawa et al., 1999). The activities of the initiative involve: creating and revising annotation schemes based on a survey of existing schemes and on annotation experiments; annotating corpora based on the proposed annotation schemes; and conducting research using the corpora, not only to examine the utility of the schemes and corpora but also to obtain new findings.
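To make the annotation levels named in the abstract more concrete, the following is a minimal sketch of how one might hold such multi-level annotation in memory. It is not the corpus's actual file format or tag set: every class name, field name, and label below (`Token`, `SlashUnit`, `DialogueSegment`, `units_by_act`, the placeholder labels) is a hypothetical choice made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Token:
    """One transcribed word with its part-of-speech label (hypothetical tag set)."""
    surface: str
    pos: str


@dataclass
class SlashUnit:
    """One slash-unit-delimited stretch of transcription, linked to the speech signal."""
    speaker: str
    start: float                      # assumed: start time in seconds
    end: float                        # assumed: end time in seconds
    tokens: List[Token]
    dialogue_act: Optional[str] = None               # hypothetical label
    prosody: Dict[str, float] = field(default_factory=dict)  # hypothetical features


@dataclass
class DialogueSegment:
    """A dialogue segment grouping consecutive slash units."""
    units: List[SlashUnit]


def units_by_act(segments: List[DialogueSegment], act: str) -> List[SlashUnit]:
    """Collect every slash unit carrying the given dialogue-act label."""
    return [u for seg in segments for u in seg.units if u.dialogue_act == act]


# Placeholder content only; these are not utterances from the corpus.
example = [
    DialogueSegment(units=[
        SlashUnit("A", 0.0, 1.2, [Token("w1", "POS1")], dialogue_act="ACT_X"),
        SlashUnit("B", 1.3, 1.9, [Token("w2", "POS2")], dialogue_act="ACT_Y"),
    ])
]
```

Keeping all levels on one slash-unit record is one design option among several; it makes cross-level questions, such as how linguistic information relates to dialogue acts, a matter of filtering records, for example with `units_by_act(example, "ACT_X")`.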
Related resources
Standoff Coordination for Multi-Tool Annotation in a Dialogue Corpus
The LUNA corpus is a multi-lingual, multidomain spoken dialogue corpus currently under development that will be used to develop a robust natural spoken language understanding toolkit for multilingual dialogue services. The LUNA corpus will be annotated at multiple levels to include annotations of syntactic, semantic, and discourse information; specialized annotation tools will be used for the a...
Co-reference annotation and resources: A multilingual corpus of typologically diverse languages
This article introduces a dialogue corpus containing data from two typologically different languages, Japanese and Kilivila. The corpus is annotated in accordance with language specific annotation schemes for co-referential and similar relations. The article describes the corpus data, the properties of language specific co-reference in the two languages and a methodology for its annotation. Exa...
A corpus for studying addressing behavior in multi-party dialogues
This paper describes a multi-modal corpus of hand-annotated meeting dialogues that was designed for studying addressing behavior in face-to-face conversations. The corpus contains annotated dialogue acts, addressees, adjacency pairs and gaze direction. First, we describe the corpus design where we present the annotation schema, annotation tools and annotation process itself. Then, we analyze th...
Evaluation of Transcription and Annotation Tools for a Multi-modal, Multi-party Dialogue Corpus
This paper reviews nine available transcription and annotation tools, considering in particular the special difficulties arising from transcribing and annotating multi-party, multi-modal dialogue. Tools are evaluated as to the ability to support the user’s annotation scheme, ability to visualize the form of the data, compatibility with other tools, flexibility of data representation, and genera...
Creating spoken dialogue characters from corpora without annotations
Virtual humans are being used in a number of applications, including simulation-based training, multi-player games, and museum kiosks. Natural language dialogue capabilities are an essential part of their human-like persona. These dialogue systems have a goal of being believable and generally have to operate within the bounds of their restricted domains. Most dialogue systems operate on a dialo...